Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

12 job splitters #45

Merged
merged 27 commits into from
Mar 11, 2024
Merged

12 job splitters #45

merged 27 commits into from
Mar 11, 2024

Conversation

GriffinBabe
Copy link
Collaborator

@GriffinBabe GriffinBabe commented Feb 12, 2024

Closes #11 and #12

  • Added job splitter using h3 library. Adding the h3 library and using it directly is more lightweight in terms of package and drastically more computationally efficient than having a pre-generated dataset containing the h3 grid. Especially when it comes to low resolutions.
  • Added job manager, extending MultiBackendJobManager. The job manager also implements post-job actions that are computed on other threads to keep the status update smooth.

TODO before merging:

  • Polish and add a demo notebook to the branch

@GriffinBabe GriffinBabe self-assigned this Feb 12, 2024
@GriffinBabe GriffinBabe added the enhancement New feature or request label Feb 12, 2024
@GriffinBabe
Copy link
Collaborator Author

@soxofaan I'm confused, there is a lint check that fails on black but the black check is disabled in the workflow file? I also tried fixing the files with black but doesn't help

@GriffinBabe
Copy link
Collaborator Author

@soxofaan Do you think I should be using multiprocessing instead of multithreading? In case GFMAP is running on a VM with multiple CPUs that would really accelerate possibly heavy post-job actions

@soxofaan
Copy link
Member

@soxofaan I'm confused, there is a lint check that fails on black but the black check is disabled in the workflow file?

That's indeed weird. What might be the case is Gihub runs the workflow from main (instead of your branch) because you are making a pull request targeting main.

@soxofaan
Copy link
Member

Do you think I should be using multiprocessing instead of multithreading? In case GFMAP is running on a VM with multiple CPUs that would really accelerate possibly heavy post-job actions

If I remember correctly, threading and multiprocessing have a common API, so I guess it's most future proof to stay within this common API, so that you can switch when one approach doesn't cut it

@GriffinBabe
Copy link
Collaborator Author

@kvantricht also corresponds to #43

@GriffinBabe GriffinBabe marked this pull request as ready for review March 7, 2024 12:38
@GriffinBabe
Copy link
Collaborator Author

Problem remaining with the "to_scl_dilation_mask", except that everything should be ready for merge

@kvantricht
Copy link
Collaborator

Problem remaining with the "to_scl_dilation_mask", except that everything should be ready for merge

is that something that needs to be fixed before merging or do we create another issue for that? What exactly was the problem?

Copy link
Collaborator

@kvantricht kvantricht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huge work done, nice! I added some comments/questions. Don't entirely get the STAC yet and how much it's tuned now to WorldCereal. Also band names still need to be updated there. In the end is this just the example belonging to Sentinel-2 extraction for WorldCereal?

What I didn't check but what's important as you know is proper testing. I trust you've been writing the proper tests along the way.

src/openeo_gfmap/manager/job_splitters.py Outdated Show resolved Hide resolved
src/openeo_gfmap/stac/constants.py Outdated Show resolved Hide resolved
src/openeo_gfmap/stac/constants.py Outdated Show resolved Hide resolved
src/openeo_gfmap/stac/constants.py Show resolved Hide resolved
src/openeo_gfmap/stac/constants.py Show resolved Hide resolved
src/openeo_gfmap/stac/constants.py Outdated Show resolved Hide resolved
@kvantricht kvantricht self-requested a review March 8, 2024 13:45
Copy link
Collaborator

@kvantricht kvantricht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will approve for now. After Monday discussion we can merge this and add remaining issues if needed.

@GriffinBabe
Copy link
Collaborator Author

Created issue for to_scl_dilation_mask and for the STAC AssetDefinition

@GriffinBabe GriffinBabe merged commit 63541e9 into main Mar 11, 2024
2 checks passed
@GriffinBabe GriffinBabe deleted the 12-job-splitters branch March 11, 2024 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement post-job actions in GFMAP (Rasterization purpose)
4 participants